A quasispecies is a set of interrelated genotypes that have reached a state of equilibrium while evolving according to the usual Darwinian principles of selection and mutation. Quasispecies studies invariably assume that any genotype can mutate into any other, but recent findings indicate that this assumption is not necessarily true. Here we revisit the traditional quasispecies theory by adopting a network structure to constrain the occurrence of mutations. This structure is governed by a random-graph model whose single parameter (a probability $p$) controls both the graph's density and the dynamics of mutation. We contribute two further modifications to the theory, one to account for the fact that different loci in a genotype may be differently susceptible to the occurrence of mutations, the other to allow for a more plausible description of the transition from adaptation to degeneracy of the quasispecies as $p$ is increased. We give analytical and simulation results for the usual case of binary genotypes, assuming a fitness landscape in which a genotype's fitness decays exponentially with its Hamming distance to the wild type. These results support the theory's assertions regarding the adaptation of the quasispecies to the fitness landscape and also its possible demise as a function of $p$.

PACS numbers: 87.23.Kg, 89.75.Fb, 02.10.Ox, 02.50.-r 



I. INTRODUCTION 

The concept of a quasispecies was introduced by Eigen and Schuster [1, 2] to describe the equilibrium state of a population of genotypes whose members mutate frequently into one another while replicating without recombination (i.e., asexually). At first the theory targeted the dynamics of complex prebiotic molecules and aimed to explain the phenomena of self-organization and adaptability that led to the appearance of life. Today, however, the quasispecies theory is thought to be much more widely applicable, for example to the dynamics of RNA viruses and to cancer research [3], in fact providing interesting insight into the dynamics of any population of genotypes, including those that replicate with recombination and mutate relatively infrequently [4].

The theory combines the evolutionary principles of selection and mutation to describe the dynamics of a population of genotypes, and in this sense constitutes the leading manifestation of the Darwinian principles at the molecular level. Its central tenet is that, although each individual genotype can be ascribed a fitness that is a function of its replicative capacity, the actual fitness effects (ranging, e.g., from strongly deleterious to highly adaptive [5-7]) are a property of the population rather than of the genotype [8]. As we observe the dynamics of the population relative to the so-called fitness landscape (i.e., the fitnesses of all possible genotypes), selection operates on the entire population and can guide it toward the landscape's peaks. In other words, even though the process of mutation remains essentially stochastic, the population can in fact influence it, because the fittest genotypes replicate more and lead the population to adapt to the fitness landscape.

In the particular case of RNA viruses, and notwithstanding some degree of controversy over how applicable the quasispecies theory is to their dynamics (cf., e.g., […], [13] and, more recently, […]), the array of implications for the understanding of viral diseases is notable. For example, the theory suggests that the fitness effects of a virus population are determined more by how free its various genotypes are to mutate than by how capable they are of replicating. Another implication seems to be that, paradoxically, increasing the genotypes' error rates during replication may render the virus less pathogenic […].

The centerpiece of the quasispecies theory is the so-called quasispecies equation, which for each possible genotype gives the rate at which the genotype's relative abundance varies with time in terms of all genotypes' abundances, their fitnesses, and the rates at which genotypes mutate into one another. We refer the reader to [1], […], and references therein for a summary of the customary assumptions and known developments. Normally a genotype is represented as a length-$L$ string of 0's and 1's, so the number of genotypes in the population is $2^L$. Every genotype can mutate into every other, so essentially there is no structure constraining the occurrence of mutations. Moreover, one generally assumes that mutations can be modeled as occurring independently at each of a genotype's loci with the same probability $u$ for each locus (a notable exception is the study in [16], where loci with different mutation rates are allowed, as are mutations of two or three adjacent loci as a group, in recognition of the plausibility of such events [17-19]).

In addition to the quasispecies itself, which is characterized by the genotypes' relative abundances at equilibrium, another important observable in the theory is the so-called error threshold, which refers to how variations in the point mutation rate $u$ determine the population's average fitness at equilibrium. The customary approach to determining this threshold is to concentrate on the relative abundance of the fittest genotype, normally called the wild (or master) type, and to study how its eventual survival depends on $u$. Invariably, such studies have assumed that no genotype can mutate into the wild type and have solved the resulting, simplified version of the quasispecies equation for the critical value of $u$, that is, the largest value for which the wild type still survives. This threshold value is a function of the wild type's fitness and of the genotype length $L$.

Here we revisit the quasispecies theory by seeking to attenuate what we perceive to be three main sources of biological implausibility. The first one is related to the total lack of structure constraining the possible mutations inside the population. Recent findings indicate, to the contrary, that for some organisms not every combination of loci can be involved in a single mutation out of a specific genotype […]. The second one has to do with the nearly ubiquitous assumption that genotypes are equally likely to undergo a mutation at any locus. In this case, too, there is evidence in support of locus-dependent mutation rates [18], even though mutations do seem to occur simultaneously at different, not necessarily contiguous, loci […].

We tackle these first two issues by adopting a susceptibility model to differentiate one locus from another as far as the occurrence of mutations at those loci is concerned. The susceptibility of a specific locus $\ell$ is any positive number $s_\ell$ that gets larger as genotypes become more susceptible to the occurrence of a mutation at locus $\ell$. Given two genotypes $i$ and $j$ that differ at locus $\ell$ and a probability parameter $p$, we use $p^{1/s_\ell}$ both to create a random-graph model that gives structure to the evolving population, in terms of whether $i$ and $j$ can mutate into each other, and to govern the dynamics of mutation if they can. Additionally, by adopting a random-graph model in the quasispecies theory we also connect the theory with the decade-long effort to understand the so-called complex networks and their applications [22-24].

Our third perceived source of implausibility comes from the assumptions that underlie the common method to determine the error threshold. Such assumptions are too stringent (no genotype mutates into the wild type) and result in a strict threshold separating the survival of the wild type in the quasispecies from its catastrophic demise. Rather, as suggested by the study in [25] and the review in [13], we believe it would be more plausible for the two regimes to be separated by a wider interval of the control parameter ($p$, in our case), over which the transition could occur more smoothly. In order to avoid the stringent assumptions that have dominated such studies so far, we start by assuming instead that a genotype's relative abundance in the quasispecies depends on its fitness as a power law. The accuracy of this assumption depends on the susceptibilities of the various loci, but in the cases we investigate it allows the average fitness of the quasispecies to be expressed analytically and the transition between degeneracy and survival to occur smoothly.

We proceed as follows, assuming that genotypes are binary (as usual) and that a genotype's fitness decays exponentially with its Hamming distance to the wild type. First we introduce our model in Sec. II, where we rewrite the quasispecies equation for the case of network-constrained mutations and, for two distinct susceptibility scenarios, solve it approximately under the assumption that a genotype's relative abundance and fitness are related by a power law. Then we give computational results in Sec. III and discuss the conditions for our analytical expressions to be good approximations to the simulation data. We conclude in Sec. IV.



II. MODEL 

We consider binary genotypes of length $L$, that is, length-$L$ sequences of 0's and 1's. There are thus $n = 2^L$ different genotypes, numbered $1, 2, \ldots, n$. We assume that genotype 1 comprises only 0's. The fitness of genotype $i$ reflects its replication rate and here is given by $f_i = 2^{-d_i}$, where $d_i$ is the number of 1's in the genotype. That is, a genotype's fitness decays exponentially with its Hamming distance to genotype 1, which is then the fittest one (with $f_1 = 1$), or wild type. While this choice seems reasonable, it is by no means the only possibility, and many other alternatives might be considered. We note, however, that adopting an exponential function has allowed many of the analytical calculations that we present in this section to be performed.

We assume that the $n$ genotypes are the nodes of a directed graph $D$ with self-loops at all nodes. The set of in-neighbors of node $i$ in $D$ is denoted by $I_i$ and its set of out-neighbors by $O_i$. It holds that both $i \in I_i$ and $i \in O_i$. The existence of an edge directed from node $i$ to node $j \neq i$ means that it is possible for genotype $i$ to mutate into genotype $j$ during replication. This happens with probability $q_{ij}$. Letting $q_{ii}$ be the probability that genotype $i$ remains unchanged during replication leads to $\sum_{j \in O_i} q_{ij} = 1$.

Let $X_i$ denote the abundance of genotype $i$ at any given time, and similarly let $x_i = X_i / \sum_{k=1}^{n} X_k$ be its relative abundance. The time derivative of $X_i$ depends on the abundance of all genotypes in $I_i$ (i.e., $i$ itself and those that can mutate into $i$ during replication) in such a way that

$$\dot{X}_i = \sum_{j \in I_i} f_j q_{ji} X_j. \tag{1}$$

Since $\sum_{i \in O_j} q_{ji} = 1$ for every $j$, summing Eq. (1) over $i$ gives $\sum_{i} \dot{X}_i = \sum_{j} f_j X_j$; rewriting for $x_i$ then yields

$$\dot{x}_i = \sum_{j \in I_i} f_j q_{ji} x_j - \phi x_i, \tag{2}$$

where $\phi = \sum_{k=1}^{n} f_k x_k$ is the average fitness of all $n$ genotypes. Equation (2) is the well-known quasispecies equation, now written for graph $D$.

In our model, both the structure of graph $D$ and the dynamics of mutation depend on how susceptible each of the $L$ loci in a genotype is to undergoing a mutation. For $\ell = 1, 2, \ldots, L$, we let $s_\ell$ be a positive number that grows with the susceptibility of a genotype to undergo a mutation at locus $\ell$, the same for all genotypes. Thus, an edge exists in graph $D$ directed from genotype $i$ to genotype $j$ with probability $p_{ij}$ such that

$$p_{ij} = p^{\sum_{\ell=1}^{L} h_\ell / s_\ell}, \tag{3}$$

where $p$ is a probability parameter and $h_\ell = 1$ if and only if the two genotypes differ at locus $\ell$ ($h_\ell = 0$, otherwise). Note that this definition of $p_{ij}$ is consistent with the mandatory existence of self-loops at all nodes of $D$, since for $j = i$ we have $h_\ell = 0$ for all $\ell$ and thus $p_{ii} = 1$. If the edge from $i$ to $j$ does exist, the probability $q_{ij}$ that $i$ mutates into $j$ (or remains unchanged, if $j = i$) is proportional to $p_{ij}$, i.e., $q_{ij} = p_{ij}/z_i$, where $z_i = \sum_{k \in O_i} p_{ik}$ is a normalizing constant for genotype $i$.
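The construction of $D$ and of the mutation probabilities $q_{ij}$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: genotypes are encoded as integers whose bits are the loci (so index 0 plays the role of the all-0 wild type, genotype 1 in the text), the function name `build_mutation_matrix` is hypothetical, and we assume that each directed edge $i \to j$ is sampled independently with probability $p_{ij}$.

```python
import random

def build_mutation_matrix(L, p, s, seed=None):
    """Sample one instance of graph D and return q, with q[i][j] = Prob(i mutates into j).

    Genotypes are integers 0..2**L - 1 whose bit l-1 is locus l; index 0 is the all-0
    wild type. s[l-1] is the susceptibility s_l. Assumption (hypothetical sketch): every
    directed edge i -> j (j != i) is sampled independently with probability p_ij.
    """
    rng = random.Random(seed)
    n = 2 ** L
    q = []
    for i in range(n):
        weights = [0.0] * n
        for j in range(n):
            diff = i ^ j                                     # loci where i and j differ
            p_ij = p ** sum(1.0 / s[l] for l in range(L) if diff >> l & 1)  # Eq. (3)
            if j == i or rng.random() < p_ij:                # self-loop always present
                weights[j] = p_ij                            # q_ij proportional to p_ij
        z_i = sum(weights)                                   # sum over out-neighbors O_i
        q.append([w / z_i for w in weights])
    return q
```

For the uniform case one would pass `s = [1.0] * L`; for the inverse-decay case introduced below, `s = [1 / l for l in range(1, L + 1)]`. With $p = 1$ every edge is present and each row becomes uniform, $q_{ij} = 1/n$.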

Henceforth we work on the hypothesis that, at equilibrium, $x_i$ depends on the fitness $f_i$ as a power law for every genotype $i$. That is, we assume that $x_i = b f_i^a$ for suitable constants $a > 0$ and $b$ when $\dot{x}_i = 0$. Such a functional dependency turns up in some of the cases we study (cf. Sec. III) and, furthermore, facilitates some of the analytical calculations that we carry out in this section. Since there are $\binom{L}{h}$ genotypes with $d_i = h$, it immediately follows that the equilibrium value of the average fitness is $\phi = b \sum_{h=0}^{L} \binom{L}{h} 2^{-(a+1)h}$, yielding

$$\phi = b \left(1 + 2^{-(a+1)}\right)^L. \tag{4}$$

Moreover, from the constraint $\sum_{i=1}^{n} x_i = 1$ we obtain

$$b \sum_{h=0}^{L} \binom{L}{h} 2^{-ah} = 1,$$

whence

$$b = \left(1 + 2^{-a}\right)^{-L}. \tag{5}$$
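Equations (4) and (5) can be checked numerically by comparing the closed forms with a direct sum over all $2^L$ genotypes. The sketch below is illustrative only; the helper names are hypothetical.

```python
def phi_b_closed_form(a, L):
    """Eqs. (4)-(5): b = (1 + 2**-a)**-L and phi = b * (1 + 2**-(a+1))**L."""
    b = (1 + 2.0 ** (-a)) ** (-L)
    return b * (1 + 2.0 ** (-(a + 1))) ** L, b

def phi_b_enumerated(a, L):
    """Same quantities from x_i = b * f_i**a summed over every genotype i."""
    f = [2.0 ** (-bin(i).count("1")) for i in range(2 ** L)]  # f_i = 2**(-d_i)
    b = 1.0 / sum(fi ** a for fi in f)                         # from sum_i x_i = 1
    phi = sum(fi * b * fi ** a for fi in f)                     # phi = sum_i f_i x_i
    return phi, b
```

Combining both equations, the equilibrium average fitness under the power-law hypothesis is $\phi = \left[(1 + 2^{-(a+1)})/(1 + 2^{-a})\right]^L$, so everything hinges on the exponent $a$.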



We estimate the value of $a$ by resorting to a mean-field version of Eq. (2), that is, one in which the expected contribution of every genotype $j$ to $\dot{x}_i$ (not only those in $I_i$) is taken into account and occurs according to the expected value of the mutation probability $q_{ji}$ of genotype $j$ into genotype $i$. By definition, mutation in this case occurs with probability proportional to $p_{ji}$, provided graph $D$ contains an edge directed from $j$ to $i$. The latter happens with probability $p_{ji}$ as well, so the expected value of $q_{ji}$ is approximated by $p_{ji}^2 / \sum_{k=1}^{n} p_{jk}^2$, the ratio of the expected numerator to the expected normalizing constant. Equation (2) then becomes

$$\dot{x}_i = \sum_{j=1}^{n} \frac{f_j\, p_{ji}^2\, x_j}{\sum_{k=1}^{n} p_{jk}^2} - \phi x_i. \tag{6}$$



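Our estimate of $a$ comes from considering the wild type at equilibrium, that is, from imposing $\dot{x}_1 = 0$ in Eq. (6) and solving the resulting equation. Substituting $f_j x_j = b\, 2^{-(a+1)d_j}$ and $x_1 = b$, dividing by $b$, and using Eqs. (4) and (5) for $\phi$, we obtain

$$\sum_{j=1}^{n} \frac{p_{j1}^2\, 2^{-(a+1)d_j}}{\sum_{k=1}^{n} p_{jk}^2} - \left(\frac{1 + 2^{-(a+1)}}{1 + 2^{-a}}\right)^{L} = 0. \tag{7}$$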
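For small $L$, Eq. (7) can also be solved numerically for any choice of susceptibilities by enumerating all genotypes and bisecting on $a$. The sketch below is a hypothetical helper, not part of the original analysis. It assumes $0 < p < 1$, in which case the left-hand side of Eq. (7) is positive at $a = 0$ and negative for large $a$, and it uses the fact that $p_{jk}$ depends only on the loci at which $j$ and $k$ differ.

```python
import math

def mean_field_exponent(L, p, s, lo=0.0, hi=60.0, tol=1e-12):
    """Solve Eq. (7) for the power-law exponent a by bisection (brute force, small L).

    Genotype index 0 is the all-0 wild type; s[l] is the susceptibility of locus l+1.
    Assumes 0 < p < 1 so that the residual changes sign on [lo, hi].
    """
    n = 2 ** L
    w = [1.0 / sl for sl in s]

    def expo(x):                                    # sum_l h_l / s_l for pattern x
        return sum(w[l] for l in range(L) if x >> l & 1)

    p2 = [p ** (2 * expo(x)) for x in range(n)]     # p_jk**2 depends on j XOR k only
    z = [sum(p2[j ^ k] for k in range(n)) for j in range(n)]  # sum_k p_jk**2
    d = [bin(j).count("1") for j in range(n)]       # Hamming distance to wild type

    def residual(a):                                # left-hand side of Eq. (7)
        lhs = sum(p2[j] * 2.0 ** (-(a + 1) * d[j]) / z[j] for j in range(n))
        rhs = ((1 + 2.0 ** (-(a + 1))) / (1 + 2.0 ** (-a))) ** L
        return lhs - rhs

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid                                # root lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

In the uniform case (`s = [1.0] * L`) the result should not depend on $L$ and should match the closed form derived below; for the inverse-decay case one would pass `s = [1 / l for l in range(1, L + 1)]`.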



We study two susceptibility scenarios. The first one, henceforth referred to as the uniform case, sets $s_\ell = 1$ for every locus $\ell$. In this case, it follows that $\sum_{\ell=1}^{L} h_\ell / s_\ell$ in Eq. (3) is the Hamming distance between genotypes $i$ and $j$, here denoted by $H_{ij}$, and therefore $p_{ij} = p^{H_{ij}}$. The summation on $k$ appearing in Eq. (7) becomes

$$\sum_{k=1}^{n} p_{jk}^2 = \sum_{h=0}^{L} \binom{L}{h} p^{2h} = \left(1 + p^2\right)^L \tag{8}$$



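for any $j$, and the summation on $j$, since $p_{j1} = p^{d_j}$, can be similarly written as a sum on the possible values $h$ of the Hamming distance $d_j$ to the wild type:

$$\sum_{j=1}^{n} p_{j1}^2\, 2^{-(a+1)d_j} = \sum_{h=0}^{L} \binom{L}{h} p^{2h}\, 2^{-(a+1)h} = \left(1 + 2^{-(a+1)} p^2\right)^L. \tag{9}$$

Substituting Eqs. (8) and (9) into Eq. (7) and taking $L$th roots yields

$$\frac{1 + 2^{-(a+1)} p^2}{1 + p^2} = \frac{1 + 2^{-(a+1)}}{1 + 2^{-a}}, \tag{10}$$

which is a quadratic equation in $2^{-a}$, namely $p^2\, 2^{-2a} + 2^{-a} - 2p^2 = 0$. Its positive root gives

$$2^a = \frac{1 + \sqrt{1 + 8p^4}}{4p^2}, \tag{11}$$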



so in the uniform case the value of the power-law exponent $a$ does not depend on $L$. For sufficiently small $p$, we can write $2^a \approx 1/(2p^2)$, which by Eqs. (4) and (5) allows the equilibrium value of $\phi$, in the uniform case, to be approximated by

$$\phi \approx \left(\frac{1 + p^2}{1 + 2p^2}\right)^L \tag{12}$$

for large $L$.
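The following sketch evaluates Eq. (11), checks that it satisfies Eq. (10), and compares the average fitness obtained from Eqs. (4) and (5) with the small-$p$ approximation of Eq. (12). Function names are hypothetical, and $0 < p \le 1$ is assumed.

```python
import math

def two_to_a_uniform(p):
    """Eq. (11): 2**a = (1 + sqrt(1 + 8 p**4)) / (4 p**2); assumes 0 < p <= 1."""
    return (1 + math.sqrt(1 + 8 * p ** 4)) / (4 * p ** 2)

def eq10_gap(p):
    """Left minus right side of Eq. (10) evaluated at the exponent of Eq. (11)."""
    y = 1 / two_to_a_uniform(p)                     # y = 2**(-a)
    return (1 + p * p * y / 2) / (1 + p * p) - (1 + y / 2) / (1 + y)

def phi_uniform(p, L):
    """Equilibrium average fitness from Eqs. (4)-(5): [(1 + y/2) / (1 + y)]**L."""
    y = 1 / two_to_a_uniform(p)
    return ((1 + y / 2) / (1 + y)) ** L

def phi_uniform_small_p(p, L):
    """Eq. (12), obtained by setting 2**a ~ 1 / (2 p**2)."""
    return ((1 + p ** 2) / (1 + 2 * p ** 2)) ** L
```

At $p = 1$, Eq. (11) gives $a = 0$ and `phi_uniform` returns $(3/4)^L$, the mean fitness of a population spread uniformly over all genotypes, which is a convenient sanity check.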

In the second susceptibility scenario, which we henceforth refer to as the inverse-decay case, we have $s_\ell = 1/\ell$ for locus $\ell$. While this specific form for the dependency of $s_\ell$ on $\ell$ is totally arbitrary and seems to carry no special biological meaning, we chose it because it is simple and has proven amenable to a certain degree of analytical manipulation. In this case it follows that $\sum_{\ell=1}^{L} h_\ell / s_\ell = \sum_{\ell=1}^{L} \ell h_\ell$ in Eq. (3), which is the sum of every $\ell$ such that genotypes $i$ and $j$ differ at locus $\ell$. Denoting this sum by $S_{ij}$ yields $p_{ij} = p^{S_{ij}}$. Now the summation on $k$ appearing in Eq. (7) becomes

$$\sum_{k=1}^{n} p_{jk}^2 = \sum_{s=0}^{L(L+1)/2} T(L,s)\, p^{2s} = \prod_{\ell=1}^{L} \left(1 + p^{2\ell}\right) \tag{13}$$

for any $j$, where $T(L,s)$ is the number of genotypes that differ from genotype $j$ in loci that sum up to $s$ […]. The summation on $j$, in turn, depends on first recognizing that the collective contribution to it from all $\binom{L}{h}$ nodes $j$ whose Hamming distance to the wild type is $d_j = h$ for fixed $h$ is proportional to

$$2^{-(a+1)h} \sum_{s=0}^{L(L+1)/2} T_h(L,s)\, p^{2s}, \tag{14}$$

where $T_h(L,s)$ is the number of genotypes whose $h$ 1's are found at loci that sum up to $s$ […]. While the summation in this expression cannot be written in a simpler form, the average value of $s$ over the $\binom{L}{h}$ genotypes involved is $(L+1)h/2$ […]: each locus lies in a fraction $h/L$ of these sets of $h$ loci, so the average is $(h/L)\sum_{\ell=1}^{L} \ell$. We then approximate that summation by $\binom{L}{h} p^{(L+1)h}$, so once again the summation on $j$ in Eq. (7) can be written as a sum on the possible values $h$ of the Hamming distance $d_j$ between the wild type and genotype $j$:

$$\sum_{j=1}^{n} p_{j1}^2\, 2^{-(a+1)d_j} \approx \sum_{h=0}^{L} \binom{L}{h} p^{(L+1)h}\, 2^{-(a+1)h} = \left(1 + 2^{-(a+1)} p^{L+1}\right)^L. \tag{15}$$

FIG. 1. (Color online) Approximation of $f(L,p)$ by $p^2/L$ in the inverse-decay case.
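The counts $T_h(L,s)$ are the numbers of $h$-element subsets of $\{1, \ldots, L\}$ whose elements sum to $s$, which a standard 0/1 subset-sum dynamic program computes. The sketch below uses it to check Eq. (13) and the average value $(L+1)h/2$ of $s$ that underlies the approximation in Eq. (15). It is an illustration only, and the helper names are hypothetical.

```python
from math import comb, prod

def subset_sum_counts(L):
    """T[h][s]: number of h-element subsets of {1, ..., L} whose elements sum to s.

    T_h(L, s) in the text is T[h][s]; T(L, s) is the sum over h of T[h][s].
    """
    smax = L * (L + 1) // 2
    T = [[0] * (smax + 1) for _ in range(L + 1)]
    T[0][0] = 1
    for l in range(1, L + 1):              # allow locus l to join a subset
        for h in range(l, 0, -1):          # descending h: each locus used at most once
            for s in range(smax, l - 1, -1):
                T[h][s] += T[h - 1][s - l]
    return T

def eq13_sides(L, p):
    """Both sides of Eq. (13) in the inverse-decay case (the same for any genotype j)."""
    T = subset_sum_counts(L)
    lhs = sum(sum(T[h][s] for h in range(L + 1)) * p ** (2 * s) for s in range(len(T[0])))
    rhs = prod(1 + p ** (2 * l) for l in range(1, L + 1))
    return lhs, rhs
```

Row `T[h]` sums to $\binom{L}{h}$, and $\sum_s s\,T_h(L,s) = \binom{L}{h}(L+1)h/2$, which is the mean used to replace the summation in Eq. (14).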



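For $f(L,p)$ such that $\prod_{\ell=1}^{L} \left(1 + p^{2\ell}\right) = [1 + f(L,p)]^L$, this leads to

$$\frac{1 + 2^{-(a+1)} p^{L+1}}{1 + f(L,p)} = \frac{1 + 2^{-(a+1)}}{1 + 2^{-a}} \tag{16}$$

and finally, solving the resulting quadratic equation in $2^{-a}$, namely $p^{L+1}\, 2^{-2a} + [1 + p^{L+1} - f(L,p)]\, 2^{-a} - 2 f(L,p) = 0$, to

$$2^a = \frac{1 + p^{L+1} - f(L,p) + \sqrt{\left[1 + p^{L+1} - f(L,p)\right]^2 + 8\, p^{L+1} f(L,p)}}{4 f(L,p)}. \tag{17}$$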



For $p < 0.2$, we have found empirically that $f(L,p) \approx p^2/L$ (Fig. 1), whence $2^a \approx (1 - p^2/L)/(2p^2/L)$ for large $L$. It then follows from Eqs. (4) and (5) that, in the inverse-decay case, the equilibrium value of $\phi$ can be approximated by

$$\phi \approx \frac{1}{\left(1 + p^2/L\right)^L}. \tag{18}$$
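The inverse-decay expressions can be evaluated as follows. The sketch computes $f(L,p)$ directly from its definition, evaluates Eq. (17), verifies that the result satisfies Eq. (16), and compares $f(L,p)$ with the empirical approximation $p^2/L$. To leading order in $p$, $\prod_{\ell}(1 + p^{2\ell}) \approx 1 + p^2 \approx (1 + p^2/L)^L$, which is consistent with that approximation. Helper names are hypothetical, and $0 < p < 1$ is assumed.

```python
import math

def f_inverse_decay(L, p):
    """f(L, p) defined by prod_{l=1..L} (1 + p**(2l)) = (1 + f)**L."""
    return math.prod(1 + p ** (2 * l) for l in range(1, L + 1)) ** (1.0 / L) - 1

def two_to_a_inverse_decay(L, p):
    """Eq. (17) with P = p**(L+1) and f = f(L, p); assumes 0 < p < 1."""
    f, P = f_inverse_decay(L, p), p ** (L + 1)
    c = 1 + P - f
    return (c + math.sqrt(c * c + 8 * P * f)) / (4 * f)

def eq16_gap(L, p):
    """Left minus right side of Eq. (16) at the exponent of Eq. (17)."""
    f, P = f_inverse_decay(L, p), p ** (L + 1)
    y = 1 / two_to_a_inverse_decay(L, p)            # y = 2**(-a)
    return (1 + P * y / 2) / (1 + f) - (1 + y / 2) / (1 + y)

def phi_inverse_decay_small_p(L, p):
    """Eq. (18): phi ~ (1 + p**2 / L)**(-L), which tends to exp(-p**2) as L grows."""
    return (1 + p ** 2 / L) ** (-L)
```

For large $L$ the term $p^{L+1}$ becomes negligible, and Eq. (17) reduces to $2^a \approx [1 - f(L,p)]/[2 f(L,p)]$, the form used above.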



III. RESULTS 

For fixed values of the length $L$ and the probability parameter $p$, our results are based on generating $10^{\ldots}$ independent instances of graph $D$ and solving Eq. (2) numerically for each instance. This is achieved by letting $x_i = 1/n$ initially for $i = 1, 2, \ldots, n$ (i.e., the initial population is uniform on all genotypes) and time-stepping the corresponding equations until $\sum_{i=1}^{n} |\dot{x}_i| < 10^{-\ldots}$. Because this entails substantial computational effort, we limit ourselves to $L = 10$ and $L = 14$ (i.e., $n = 1024$ and $n = 16\,384$ distinct genotypes, respectively).
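The time-stepping procedure can be sketched as follows for a single instance of $D$. The integrator is an assumption: the text does not state which scheme was used, so this sketch uses explicit Euler with a fixed step, which preserves $\sum_i x_i = 1$ because the rows of $q$ sum to 1. The default tolerance is likewise an illustrative choice, and `equilibrate` is a hypothetical name.

```python
def equilibrate(q, f, dt=0.1, tol=1e-10, max_steps=10**6):
    """Time-step Eq. (2) from the uniform population until sum_i |dx_i/dt| < tol.

    q[j][i]: probability that genotype j mutates into i (rows sum to 1; zero when D
    has no edge j -> i). f[i]: fitness of genotype i. Explicit Euler, step dt <= 1.
    """
    n = len(f)
    x = [1.0 / n] * n                                   # uniform initial population
    for _ in range(max_steps):
        phi = sum(fi * xi for fi, xi in zip(f, x))      # average fitness
        inflow = [0.0] * n                              # sum_j f_j q_ji x_j for each i
        for j in range(n):
            w = f[j] * x[j]
            if w:
                for i, q_ji in enumerate(q[j]):
                    if q_ji:
                        inflow[i] += w * q_ji
        dx = [inflow[i] - phi * x[i] for i in range(n)]  # right-hand side of Eq. (2)
        if sum(abs(v) for v in dx) < tol:
            break
        x = [xi + dt * v for xi, v in zip(x, dx)]        # explicit Euler step
    return x
```

In a full experiment one would build `q` for a sampled graph $D$ (from Eq. (3) and $q_{ij} = p_{ij}/z_i$), call `equilibrate` once per instance, and average the resulting abundances over instances. Two limiting checks: with no mutations ($q$ the identity) the wild type takes over, and with $q_{ij} = 1/n$ the uniform population is already at equilibrium.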

The resulting relative abundances of the quasispecies are given in Fig. 2 as a function of the genotypes' fitnesses. By construction there are in general several different genotypes of the same fitness, so in the figure we give the average relative abundance of all such genotypes. In the uniform case, these results reveal an average behavior of same-fitness genotypes that is in excellent agreement with the power-law assumption we made. Moreover, as indicated by Eq. (11), the power-law exponent $a$ does not depend on $L$, being a function of $p$ exclusively. In the inverse-decay case, on the other hand, the power-law assumption is reasonable only for the highest fitness values. At these values, it is worth noting that the power-law exponent $a$ as given by Eq. (17) agrees reasonably well with the data despite the approximation of the summation in Eq. (14) by $\binom{L}{h} p^{(L+1)h}$. The reason for this is that, once these expressions are multiplied by $2^{-(a+1)h}$ and summed over $h$, the results are dominated by the lowest $h$ values, hence the highest fitnesses, and these are precisely the values at which the approximation works best [in fact, both the summation in Eq. (14) and its approximation yield 1 for $h = 0$, since $T_0(L,s) = 1$ if $s = 0$ and $T_0(L,s) = 0$ otherwise].
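Comparing simulated abundances with the power law $x_i = b f_i^a$ amounts to averaging $x_i$ within each fitness class $f = 2^{-h}$ and examining $\log \bar{x}$ against $\log f$. A least-squares line in these coordinates gives an empirical exponent to set beside Eq. (11) or (17). This fitting step is our own illustration, not a procedure described in the text, and `fit_power_law` is a hypothetical name. The optional `h_max` restricts the fit to the highest-fitness classes, where the power law is reasonable in the inverse-decay case.

```python
import math

def fit_power_law(x, L, h_max=None):
    """Least-squares fit of log(mean x) = log b + a * log f over fitness classes.

    x[i] is the relative abundance of genotype i (bits = loci, index 0 = wild type);
    class h collects the genotypes with h 1's, whose fitness is f = 2**(-h).
    Requires h_max >= 1 and positive class averages.
    """
    h_max = L if h_max is None else h_max
    sums = [0.0] * (L + 1)
    counts = [0] * (L + 1)
    for i, xi in enumerate(x):
        h = bin(i).count("1")
        sums[h] += xi
        counts[h] += 1
    hs = range(h_max + 1)
    X = [-h * math.log(2) for h in hs]                   # log f
    Y = [math.log(sums[h] / counts[h]) for h in hs]      # log of class average
    mx, my = sum(X) / len(X), sum(Y) / len(Y)
    a = sum((u - mx) * (v - my) for u, v in zip(X, Y)) / sum((u - mx) ** 2 for u in X)
    b = math.exp(my - a * mx)
    return a, b
```

Applied to abundances that follow the power law exactly, the fit returns the exponent and prefactor used to generate them; on simulation data, a visible curvature in the log-log plot is what signals the breakdown of the power-law assumption.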

Figure 2 also reveals how the dominance of the wild type in the population behaves as $p$ is increased and mutations into ever more different genotypes begin to be both allowed by the structure of $D$ and made more frequent during the dynamics. A clearer view of this is afforded by Fig. 3, where we show the relative abundance of the wild type in the quasispecies as a function of $p$. Clearly, in both the uniform and the inverse-decay cases there exist values of $p$ beyond which the wild type gets diluted into the population just as all other genotypes do. This happens at higher values in the inverse-decay case, since the $1/\ell$ susceptibility for locus $\ell$ tends to discourage mutations at this locus for all but relatively small values of $\ell$ despite increases in $p$.

FIG. 2. (Color online) Relative abundances at equilibrium. For each fitness $2^{-h}$, where $h$ is one of the possible values of the Hamming distance to the wild type, data are averages over all genotypes that have that fitness and $10^{\ldots}$ independent instances of graph $D$. Lines refer to the power law of exponent $a$ as given by Eq. (11) in the uniform case or Eq. (17) in the inverse-decay case.

FIG. 3. (Color online) Relative abundance of the wild type at equilibrium. Data are averages over $10^{\ldots}$ independent instances of graph $D$. Lines refer to $x_1 = b f_1^a = b$, with $b$ given by Eq. (5) and $a$ by Eq. (11) in the uniform case or Eq. (17) in the inverse-decay case.

Figure 3 also illustrates how well the power-law exponent $a$ in Eq. (11) or (17) performs when we focus on the wild type across the entire range of $p$. While the agreement with the data is once again very good in the uniform case, in the inverse-decay case this holds only for roughly $p < 0.2$ or $p > 0.9$. As above, explaining this requires that we revisit the approximation of the summation in Eq. (14) by $\binom{L}{h} p^{(L+1)h}$. Specifically, as we sum the product of either quantity by $2^{-(a+1)h}$ over $h$, sufficiently small values of $p$ render the differences caused by the approximation irrelevant. Similarly, for sufficiently large values of $p$ the approximation is good across a wide range of $h$ values, as shown in Fig. 4.

A better glimpse into wild-type survival comes from considering the average fitness $\phi$ of the quasispecies. This is depicted in Fig. 5, which clearly indicates that the transition from survival to degeneracy of the wild type occurs gradually, within roughly one order of magnitude of the parameter $p$ as it is increased. In the figure we also display our analytical predictions for $\phi$ at equilibrium. These are given, through Eqs. (4) and (5), as functions of the power-law exponent $a$ in Eqs. (11) and (17). The same observations on accuracy given above continue to apply. Figure 5 also contains the simpler approximations of $\phi$ at equilibrium given by the Gaussians in Eqs. (12) and (18), respectively for the uniform case and the inverse-decay case. As expected, these approximations work very well for small values of $p$. The one for the uniform case tends to improve as $L$ is increased.

FIG. 4. (Color online) Comparison between the summation in Eq. (14), here referred to as $g(L,p,h)$, and $\binom{L}{h} p^{(L+1)h}$ for $p = 0.95$.

FIG. 5. (Color online) Average fitness at equilibrium. Data are averages over $10^{\ldots}$ independent instances of graph $D$. Lines refer to Eqs. (4) and (5) with $a$ as given by Eq. (11) in the uniform case or Eq. (17) in the inverse-decay case, or to the Gaussian of Eq. (12) in the uniform case or Eq. (18) in the inverse-decay case.



IV. CONCLUSIONS 

We have revisited the quasispecies theory and examined what we believe to be drawbacks in its customary modeling assumptions. These are the absence of an underlying structure separating the mutations that can occur from those that cannot; the lack of a general framework within which a genotype's loci can be sorted into different susceptibilities to undergo mutations; and, finally, a methodology to explain the degeneracy of the wild type, when mutations are excessively frequent, that implies an abrupt transition from the regime in which it survives. Our approach to tackling these issues has been, respectively, to model the mutational interactions among genotypes as a random graph; to adopt real-valued susceptibilities that influence both the graph's structure and the dynamics of the population; and to postulate a specific functional dependency of a genotype's relative abundance on its fitness at equilibrium. The resulting model has a probability, $p$, as its single parameter. Increasing $p$ makes the graph denser and allows more mutations as the population evolves toward the quasispecies.

It is important to note that our model does not merely generalize the common approach of assuming that graph $D$ has an edge directed from any genotype to any other and that any locus in a genotype is equally susceptible to undergoing a mutation at the same point rate $u$. Even though in the two models it is sometimes possible to write the mutation probability $q_{ij}$ of genotype $i$ into genotype $j$ as very similar products over all $L$ loci [in the customary approach we have $q_{ij} = u^{H_{ij}} (1 - u)^{L - H_{ij}}$; in our model, assuming for example the uniform case, we have $q_{ij} = (p/z_i^{1/L})^{H_{ij}} (1/z_i^{1/L})^{L - H_{ij}}$], the similarity between them can be carried no further. In fact, setting $p = 1$ in our model to ensure that $D$ is always fully connected yields $q_{ij} = 1/n = 0.5^L$ regardless of $i$ or $j$, which does not conform with the usual approach unless $u = 0.5$. The bottom line is that substantial further studies are needed to determine whether characteristic values of $p$ exist for as many organisms as possible, much as has been done for the rate $u$ (cf., e.g., […]).
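The comparison between the two product forms can be made concrete with a small numerical check. The sketch evaluates both expressions, treating the normalizing constant $z_i$ as given (in the model it depends on the sampled graph), and confirms that at $p = 1$, where $z_i = n = 2^L$, the model's $q_{ij}$ coincides with the customary one for $u = 0.5$ but not, for instance, for $u = 0.1$. Function names are hypothetical.

```python
def q_customary(H, L, u):
    """Customary model: independent point mutations with rate u at each of L loci."""
    return u ** H * (1 - u) ** (L - H)

def q_model_uniform(H, L, p, z):
    """This model, uniform case: (p / z**(1/L))**H * (1 / z**(1/L))**(L - H) = p**H / z,
    where z is the normalizing constant z_i of the source genotype in the sampled graph."""
    r = z ** (1.0 / L)
    return (p / r) ** H * (1 / r) ** (L - H)
```

The product form of `q_model_uniform` collapses to $p^{H_{ij}}/z_i$, so the per-locus "rates" $p/z_i^{1/L}$ and $1/z_i^{1/L}$ need not add up to 1, unlike $u$ and $1 - u$ in the customary model.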

Our results were given for the nontrivial fitness landscape in which a genotype's fitness decays exponentially with its Hamming distance to the wild type. They have also been based on two specific susceptibility scenarios and on a power-law relationship between a genotype's relative abundance in the quasispecies and its fitness. While the latter is accurate over a wide range of conditions only for one of the susceptibility scenarios (the uniform case), overall our modeling choices have led to useful analytical predictions of both the participation of the various genotypes in the quasispecies and the wild type's transition from survival to degeneracy as $p$ increases.

As with other variations of the quasispecies theory, the modifications we have introduced all corroborate the theory's central idea, viz., that selection and mutation act on the entire ensemble of genotypes. They also corroborate the crucial role of the error-related parameter ($p$, in our case) in separating two distinct regimes, one in which the quasispecies adapts to the fitness landscape, the other in which it becomes degenerate. It remains to be seen whether the same will continue to hold as alternative fitness landscapes and variations of the remaining assumptions are studied.